70 research outputs found
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Vision Transformers (ViTs) have shown impressive performance and have become
a unified backbone for multiple vision tasks. However, both attention and
multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense
multiplications, resulting in costly training and inference. To this end, we
propose to reparameterize the pre-trained ViT with a mixture of multiplication
primitives, e.g., bitwise shifts and additions, towards a new type of
multiplication-reduced model, dubbed ShiftAddViT, which aims for
end-to-end inference speedups on GPUs without the need to train from
scratch. Specifically, all matrix multiplications among queries, keys, and values
are reparameterized by additive kernels, after mapping queries and keys to
binary codes in Hamming space. The remaining MLPs or linear layers are then
reparameterized by shift kernels. We utilize TVM to implement and optimize
those customized kernels for practical hardware deployment on GPUs. We find
that such a reparameterization on (quadratic or linear) attention maintains
model accuracy, while inevitably leading to accuracy drops when applied
to MLPs. To marry the best of both worlds, we further propose a new mixture of
experts (MoE) framework to reparameterize MLPs by taking multiplication or its
primitives as experts, e.g., multiplication and shift, and designing a new
latency-aware load-balancing loss. Such a loss helps to train a generic router
for assigning a dynamic number of input tokens to different experts according
to their latency. In principle, the faster an expert runs, the more input
tokens it is assigned. Extensive experiments consistently validate the
effectiveness of our proposed ShiftAddViT, achieving up to
5.18× latency reductions on GPUs and 42.9% energy
savings, while maintaining accuracy comparable to original or efficient ViTs.
Comment: Accepted by NeurIPS 2023
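To make the shift reparameterization above concrete, here is a minimal PyTorch
sketch of a linear layer whose weights are rounded to signed powers of two, so
that on integer hardware each multiplication could be replaced by a sign flip
plus a bitwise shift. The class name, initialization, and straight-through
trick are illustrative assumptions, not the authors' TVM kernels.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ShiftLinear(nn.Module):
        # Hypothetical sketch: weights quantized to signed powers of two.
        def __init__(self, in_features, out_features):
            super().__init__()
            self.weight = nn.Parameter(0.02 * torch.randn(out_features, in_features))

        def forward(self, x):
            w = self.weight
            sign = torch.sign(w)
            # Round |w| to the nearest power of two: 2^round(log2 |w|).
            exponent = torch.round(torch.log2(w.abs().clamp_min(1e-8)))
            w_shift = sign * torch.pow(2.0, exponent)
            # Straight-through estimator: shift weights in the forward pass,
            # gradients flow to the underlying dense weights.
            return F.linear(x, w + (w_shift - w).detach())

    layer = ShiftLinear(64, 32)
    print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 32])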
NetBooster: Empowering Tiny Deep Learning By Standing on the Shoulders of Deep Giants
Tiny deep learning has attracted increasing attention driven by the
substantial demand for deploying deep learning on numerous intelligent
Internet-of-Things devices. However, it is still challenging to unleash tiny
deep learning's full potential on both large-scale datasets and downstream
tasks due to the under-fitting issues caused by the limited model capacity of
tiny neural networks (TNNs). To this end, we propose a framework called
NetBooster to empower tiny deep learning by augmenting the architectures of
TNNs via an expansion-then-contraction strategy. Extensive experiments show
that NetBooster consistently outperforms state-of-the-art tiny deep learning
solutions.
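As a rough illustration of one expansion-then-contraction step, the sketch
below over-parameterizes a single linear layer into two wider layers during
training and then folds them back into one. The purely linear expansion (which
makes the fold exact) is a simplifying assumption for illustration, not the
paper's actual recipe.

    import torch
    import torch.nn as nn

    class ExpandedLinear(nn.Module):
        # Expansion: one tiny layer temporarily becomes two wider ones.
        def __init__(self, in_f, out_f, expand=4):
            super().__init__()
            self.up = nn.Linear(in_f, expand * out_f, bias=False)
            self.down = nn.Linear(expand * out_f, out_f, bias=False)

        def forward(self, x):
            return self.down(self.up(x))

        def contract(self):
            # Contraction: fold both weight matrices back into a single
            # layer, recovering the original tiny architecture exactly.
            merged = nn.Linear(self.up.in_features, self.down.out_features, bias=False)
            merged.weight.data = self.down.weight @ self.up.weight
            return merged

    block, x = ExpandedLinear(16, 8), torch.randn(2, 16)
    assert torch.allclose(block(x), block.contract()(x), atol=1e-5)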
NetDistiller: Empowering Tiny Deep Learning via In-Situ Distillation
Boosting the task accuracy of tiny neural networks (TNNs) has become a
fundamental challenge for enabling the deployment of TNNs on edge devices,
which are constrained by strict limits on memory, computation, bandwidth,
and power supply. To this end, we propose a framework called
NetDistiller to boost the achievable accuracy of TNNs by treating them as
sub-networks of a weight-sharing teacher constructed by expanding the number of
channels of the TNN. Specifically, the target TNN model is jointly trained with
the weight-sharing teacher model via (1) gradient surgery to tackle the
gradient conflicts between them and (2) uncertainty-aware distillation to
mitigate the overfitting of the teacher model. Extensive experiments across
diverse tasks validate NetDistiller's effectiveness in boosting TNNs'
achievable accuracy over state-of-the-art methods. Our code is available at
https://github.com/GATECH-EIC/NetDistiller
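The gradient-surgery idea can be sketched with a PCGrad-style projection: when
the task gradient and the distillation gradient conflict, the conflicting
component is removed before the update. This is a hedged reading of "gradient
surgery" for illustration, not necessarily the exact rule in the repository
above.

    import torch

    def surgery_step(g_task, g_distill):
        # If the two gradients conflict (negative inner product), project
        # the distillation gradient onto the normal plane of the task
        # gradient before summing, PCGrad-style.
        dot = torch.dot(g_task, g_distill)
        if dot < 0:
            g_distill = g_distill - (dot / g_task.norm().pow(2)) * g_task
        return g_task + g_distill

    g_task = torch.tensor([1.0, 0.0])
    g_distill = torch.tensor([-0.5, 1.0])   # conflicts with g_task
    print(surgery_step(g_task, g_distill))  # tensor([1., 1.])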
Experimental study on the mechanical controlling factors of fracture plugging strength for lost circulation control in shale gas reservoir
The geological conditions of shale reservoirs present several unique
challenges. These include the extensive development of multi-scale fractures,
frequent losses during horizontal drilling, low success rates in plugging, and
a tendency for the fracture plugging zone to fail repeatedly. Extensive
analysis suggests that the weakening of the mechanical properties of shale
fracture surfaces is the primary factor reducing the bearing capacity of the
fracture plugging zone. To assess the influence of oil-based environments on
the degradation of the mechanical properties of shale fracture surfaces,
mechanical property tests were conducted on shale samples after exposure to
various substances, including white oil, lye, and the filtrate of oil-based
drilling fluid. The experimental results demonstrate that the average values
of the elastic modulus and indentation hardness of dry shale are 24.30 GPa and
0.64 GPa, respectively. Upon immersion in white oil, these values decrease to
22.42 GPa and 0.63 GPa, respectively. Additionally, the depth loss rates of
dry shale and white oil-soaked shale are determined to be 57.12% and 61.96%,
respectively, indicating an increased degree of fracturing on the shale
surface. White oil, lye, and the filtrate of oil-based drilling fluid all
reduce the friction coefficient of the shale surface; the average friction
coefficients measured for white oil, lye, and oil-based drilling fluid are
0.80, 0.72, and 0.76, respectively, reflecting their individual weakening
effects. Furthermore, the contact mode between the plugging materials and the
fracture surface can also reduce the friction coefficient between them. To
enhance the bearing capacity of the plugging zone, a series of plugging
experiments were conducted using high-strength materials, high-friction
materials, and nanomaterials, selected based on this understanding of the
weakened mechanical properties of the fracture surface. The results
demonstrate that the reduced mechanical properties of the fracture surface
diminish the pressure-bearing capacity of the plugging zone, whereas
high-strength materials, high-friction materials, and nanomaterials
effectively enhance it. These findings offer guidance for improving the
sealing pressure capacity of shale fractures and increasing the success rate
of leakage control measures during shale drilling and completion.
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention During Vision Transformer Inference
Vision Transformers (ViTs) have shown impressive performance but still
require a high computation cost compared to convolutional neural networks
(CNNs). One reason is that ViTs' attention measures global similarities and
thus has quadratic complexity in the number of input tokens. Existing
efficient ViTs adopt local attention (e.g., Swin) or linear attention (e.g.,
Performer), which sacrifice ViTs' capabilities of capturing either global or
local context. In this work, we ask an important research question: Can ViTs
learn both global and local context while being more efficient during
inference? To this end, we propose a framework called Castling-ViT, which
trains ViTs using both linear-angular attention and masked softmax-based
quadratic attention, but then switches to having only linear-angular attention
during ViT inference. Our Castling-ViT leverages angular kernels to measure the
similarities between queries and keys via spectral angles. We further
simplify it with two techniques: (1) a novel linear-angular attention
mechanism: we decompose the angular kernels into linear terms and high-order
residuals, and only keep the linear terms; and (2) we adopt two parameterized
modules to approximate high-order residuals: a depthwise convolution and an
auxiliary masked softmax attention to help learn both global and local
information, where the masks for softmax attention are regularized to gradually
become zeros and thus incur no overhead during ViT inference. Extensive
experiments and ablation studies on three tasks consistently validate the
effectiveness of the proposed Castling-ViT, e.g., achieving up to a 1.8% higher
accuracy or 40% MACs reduction on ImageNet classification and a 1.2 higher mAP
on COCO detection under comparable FLOPs, compared to ViTs with vanilla
softmax-based attention.
Comment: CVPR 2023
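To show why dropping the quadratic map helps, here is a generic
linear-attention sketch using the associativity trick (compute K^T V before
multiplying by Q). The elu+1 feature map is a common stand-in and not the
paper's angular kernel, whose linear term the authors derive from spectral
angles.

    import torch
    import torch.nn.functional as F

    def linear_attention(q, k, v, eps=1e-6):
        # Non-negative feature map so attention weights stay valid; the
        # paper's linear-angular kernel is swapped for elu+1 here.
        phi_q, phi_k = F.elu(q) + 1, F.elu(k) + 1
        kv = torch.einsum('bnd,bne->bde', phi_k, v)   # K^T V: O(N d^2)
        z = torch.einsum('bnd,bd->bn', phi_q, phi_k.sum(1)) + eps
        return torch.einsum('bnd,bde->bne', phi_q, kv) / z.unsqueeze(-1)

    q = k = v = torch.randn(1, 196, 64)      # 196 tokens, a 14x14 ViT grid
    print(linear_attention(q, k, v).shape)   # torch.Size([1, 196, 64])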
ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design
Vision Transformers (ViTs) have achieved state-of-the-art performance on
various vision tasks. However, ViTs' self-attention module is still arguably a
major bottleneck, limiting their achievable hardware efficiency. Meanwhile,
existing accelerators dedicated to NLP Transformers are not optimal for ViTs.
This is because there is a large difference between ViTs and NLP Transformers:
ViTs have a relatively fixed number of input tokens, whose attention maps can
be pruned by up to 90% even with fixed sparse patterns; while NLP Transformers
need to handle input sequences of varying numbers of tokens and rely on
on-the-fly predictions of dynamic sparse attention patterns for each input to
achieve a decent sparsity (e.g., >=50%). To this end, we propose a dedicated
algorithm and accelerator co-design framework dubbed ViTCoD for accelerating
ViTs. Specifically, on the algorithm level, ViTCoD prunes and polarizes the
attention maps to have either denser or sparser fixed patterns for regularizing
two levels of workloads without hurting the accuracy, largely reducing the
attention computations while leaving room for alleviating the remaining
dominant data movements; on top of that, we further integrate a lightweight and
learnable auto-encoder module to enable trading the dominant high-cost data
movements for lower-cost computations. On the hardware level, we develop a
dedicated accelerator to simultaneously coordinate the enforced denser/sparser
workloads and encoder/decoder engines for boosted hardware utilization.
Extensive experiments and ablation studies validate that ViTCoD largely reduces
the dominant data movement costs, achieving speedups of up to 235.3x, 142.9x,
86.0x, 10.1x, and 6.8x over general computing platforms (CPUs, EdgeGPUs, GPUs)
and prior-art Transformer accelerators (SpAtten and Sanger), respectively, under
an attention sparsity of 90%.
Comment: Accepted to HPCA 2023
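A toy version of the prune-and-polarize step might look like the following:
threshold a pretrained attention map into a fixed binary mask at a target
sparsity, then reorder columns by density so dense and sparse workloads
separate cleanly. The thresholding and reordering heuristics here are
assumptions for illustration, not ViTCoD's actual algorithm.

    import torch

    def polarize(attn, keep_ratio=0.1):
        # Keep only the strongest keep_ratio of entries as a fixed sparse
        # pattern, then sort columns so denser ones come first, separating
        # the two levels of workloads.
        k = max(1, int(attn.numel() * keep_ratio))
        thresh = attn.flatten().topk(k).values.min()
        mask = (attn >= thresh).float()
        order = mask.sum(dim=0).argsort(descending=True)
        return mask[:, order], order

    attn = torch.rand(16, 16).softmax(dim=-1)        # toy attention map
    mask, order = polarize(attn)
    print(f"density: {mask.mean().item():.2f}")      # ~0.10, i.e., 90% sparse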